The Roche Cancer Genome Database 2.0
Introduction

Cancer is characterized by numerous genome alterations that stem from a multitude of DNA sequence mutations. Certain mutations are critical in the development of tumors and metastasis and can be selectively targeted by chemotherapeutics. The Roche Cancer Genome Database 2.0 (RCGDB) is an integrated biological information system that combines already available human mutation databases. Its user-friendly graphical interface provides access to the data in several ways: single searches by gene, cell line or disease (disease pathways); batch searches of genes and cell lines; and customized searches using various filter criteria (1).

Background

Large amounts of genetic information on cancer have become readily available in recent years through new high-throughput techniques. Genes associated with cancer and other diseases contribute substantially to the mechanisms and development of a given ailment. Somatic mutations in cancer are currently used in clinical studies and as targets for chemotherapeutics. Mutation analysis is therefore of high importance for advancing our understanding of cancer and other diseases, as well as for drug discovery and personalized health care (1) (2). Although a plethora of information on such mutations is available, the data are spread over an increasing number of databases (2). Researchers today are limited not only by the information available to them but also by the absence of unified data (3). A comprehensive, integrated cancer genome information system could vastly improve our understanding of the relationship between cancer and its mutations (1).

Creation

The RCGDB contains an internal data model that can store vast amounts of cancer genome information, and its search function uses a variety of algorithms to map external databases.
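
As a rough illustration of what mapping external databases into one internal data model can involve, the sketch below normalizes records from two differently shaped sources into a single record type and removes duplicates. The class `MutationRecord`, the source names `source_a` and `source_b`, and every field name are hypothetical; none of them are taken from RCGDB or from the export format of any real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationRecord:
    """Hypothetical unified record; not RCGDB's actual schema."""
    gene: str
    sample: str
    change: str
    origin: str  # "somatic" or "germline"
    source: str


def from_source_a(row: dict) -> MutationRecord:
    # Assumed layout of source A: somatic-only data with snake_case keys.
    return MutationRecord(
        gene=row["gene_symbol"].upper(),
        sample=row["sample_name"],
        change=row["aa_change"],
        origin="somatic",
        source="source_a",
    )


def from_source_b(row: dict) -> MutationRecord:
    # Assumed layout of source B: capitalized keys and an explicit origin column.
    return MutationRecord(
        gene=row["Gene"].upper(),
        sample=row["CellLine"],
        change=row["Mutation"],
        origin=row["Origin"].lower(),
        source="source_b",
    )


def integrate(rows_a, rows_b):
    """Merge both sources, keeping the first record seen for each event."""
    seen = set()
    merged = []
    records = [from_source_a(r) for r in rows_a] + [from_source_b(r) for r in rows_b]
    for rec in records:
        key = (rec.gene, rec.sample, rec.change)  # identity of one mutation event
        if key not in seen:
            seen.add(key)
            merged.append(rec)
    return merged
```

Keeping one mapper function per source means that adding another database only requires a new mapper, while the unified record and the deduplication step stay unchanged.
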
Information on somatic mutations is gathered from COSMIC at the Sanger Institute (4), The Cancer Genome Atlas project (5), the IARC TP53 database (6), OMIM (7), KinMutBase (8) and the L1CAM mutation database (8). Alongside somatic and germline mutations, SNPs and their frequencies of occurrence are retrieved from the International HapMap Project (9). Amplification data from the Database of Genomic Variants and NCBI SKY/M-FISH, as well as data from the Tumorscape project at the Broad Institute, are also included in RCGDB (10) (11).

Usage and Search Tools

Within the RCGDB, three search modes are available to users: simple search, batch search and smart search.
*Simple Search - A Google-like interface offers a search for cancer genomic information on genes, samples, cell lines, diseases and pathways. The gene search is supported by auto-suggestion and accepts NCBI GeneIDs, names and symbols. A single gene search can yield information on somatic and germline mutations, SNPs, amplification data and SKY/M-FISH data. Cell line queries return all known cancer mutation information for a given sample in table format. One may also search by histology or tissue type.
*Batch Search - This is an extension of the simple search in which multiple queries can be entered in the search bar. Users can search multiple cell lines, diseases, tissues and more to observe potential links (or the lack thereof) in a mutational analysis format.
*Smart Search - These customized interfaces provide a platform for regularly occurring requests of higher complexity. They present data in a highly condensed manner and offer information such as wild type versus mutational events in cell lines and tissues. This is useful for users who have pasted a number of genes or cell lines into the search engine.
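
The condensed wild type versus mutated view described for smart search can be pictured as a gene-by-cell-line matrix. The following is a minimal sketch under stated assumptions: the functions `mutation_matrix` and `format_matrix`, the `(gene, cell_line)` input pairs, and any identifiers passed to them are hypothetical and do not reflect how RCGDB is implemented.

```python
def mutation_matrix(genes, cell_lines, mutated_pairs):
    """Return {gene: {cell_line: "mut" or "wt"}} for the requested batch.

    mutated_pairs is an iterable of (gene, cell_line) tuples with a known
    mutation. Every other combination is reported as wild type, which is a
    simplification: a real system would separate "wild type" from "not tested".
    """
    mutated = set(mutated_pairs)
    return {
        gene: {
            line: ("mut" if (gene, line) in mutated else "wt")
            for line in cell_lines
        }
        for gene in genes
    }


def format_matrix(matrix, cell_lines):
    """Render the matrix as tab-separated text, one gene per row."""
    rows = ["gene\t" + "\t".join(cell_lines)]
    for gene, statuses in matrix.items():
        rows.append(gene + "\t" + "\t".join(statuses[line] for line in cell_lines))
    return "\n".join(rows)
```

Calling `mutation_matrix` with a pasted list of genes and cell lines, then passing the result to `format_matrix`, gives a compact table in which each cell shows at a glance whether a gene is mutated in a given cell line.
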
One may also use smart search for integrative analysis which can yield information regarding statistical significance of a mutation within a given sample. This search function is used when the user declines the "auto-suggestion" feature in simple search.

References

#http://www.biomedcentral.com/1755-8794/4/43
#Thomas RK, et al. (2007) High-throughput oncogene mutation profiling in human cancer. Nat Genet 39(3):347-351
#Cochrane GR & Galperin MY (2010) The 2010 Nucleic Acids Research Database Issue and online Database Collection: a community of data resources. Nucleic Acids Research 38(suppl 1):D1-D4
#Forbes SA, et al. (2001) The Catalogue of Somatic Mutations in Cancer (COSMIC), in Current Protocols in Human Genetics, (John Wiley & Sons, Inc.)
#Collaboration (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455(7216):1061-1068
#Petitjean A, et al. (2007) Impact of mutant p53 functional properties on TP53 mutation patterns and tumor phenotype: lessons from recent developments in the IARC TP53 database. Hum Mutat 28(6):622-629
#http://www.omim.org/
#Ortutay C, Valiaho J, Stenberg K, & Vihinen M (2005) KinMutBase: a registry of disease-causing mutations in protein kinase domains. Hum Mutat 25(5):435-442
#Frazer KA, et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851-861
#Iafrate AJ, et al. (2004) Detection of large-scale variation in the human genome. Nat Genet 36(9):949-951
#Beroukhim R, et al. (2010) The landscape of somatic copy-number alteration across human cancers. Nature 463(7283):899-905